Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ed's pet PR with no flake retries #17831

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

edsantiago
Copy link
Member

This should make system tests work with the requested DB

Signed-off-by: Ed Santiago santiago@redhat.com

None

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 17, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 17, 2023
@edsantiago
Copy link
Member Author

Debian Build barfing with:

time="2023-03-17T14:06:47Z"
    level=error
    msg="reading system config \"/etc/containers/containers.conf\":
       decode configuration /etc/containers/containers.conf: toml: line 4: Key 'engine' has already been defined."

...but all other Build tasks (Fedora) pass. What is different about Debian TOML parser? And if Debian is different in this regard, what other Debian differences are there, and maybe one of those explains #17607?

@edsantiago
Copy link
Member Author

Oh, never mind; this is probably because Debian gets 'engine ... = runc'. TOML is annoying.

@edsantiago edsantiago force-pushed the ci_sqlite branch 2 times, most recently from 75d88a8 to d2c2cf4 Compare March 17, 2023 16:01
@edsantiago edsantiago marked this pull request as draft March 17, 2023 16:01
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 17, 2023
@edsantiago
Copy link
Member Author

It works! It works! Lots of failures:

Error: adding container id to database: database is locked

...but these happen in int tests, too, except that those get retried, these don't. @mheon PTAL.

@edsantiago
Copy link
Member Author

Oh, and @vrothberg, this minimizes the need for an envariable

@mheon
Copy link
Member

mheon commented Mar 17, 2023

The bash here scares me, but LGTM

@mheon
Copy link
Member

mheon commented Mar 17, 2023

# Error: adding container id to database: database is locked

That's new

@edsantiago
Copy link
Member Author

@mheon no, what I mean is, what do we do with this? Obviously we can't enable it, because sqlite isn't ready. Are these failures helpful to you? (And no, "database is locked" is not new, look at int tests logs for any PR this week. It's all over the place, just, int tests get retried so these are not usually hard failures.)

@mheon
Copy link
Member

mheon commented Mar 17, 2023

@edsantiago My understanding was that it is ready, and all tests should be passing. The fact that we're flaking frequently is news to me, but I haven't been spending much time in PRs this week.

@mheon
Copy link
Member

mheon commented Mar 17, 2023

Per SQLite docs it looks like internal concurrency issue across the same DB connection, which I thought their locking would handle... Looking further.

@edsantiago
Copy link
Member Author

Just looking at the very first int log I could find today, here ya go, search in-page for "database is locked". (Weird, though, those tests pass. The "UNIQUE constraint failed" errors are hard though. I assume you know all about those.

@mheon
Copy link
Member

mheon commented Mar 17, 2023

We have some pretty clear concurrency issues on display here. The UNIQUE constraint bits look like a DB creation race that I thought would be impossible, but should be fairly easy to code around. The DB locking one is far more problematic. @vrothberg From my reading here, we may need to start using separate DB connections within the same Podman process for each access, or otherwise introduce some sort of locking to prevent more than 1 access to the DB at a time, because the locking on the connection doesn't seem to be sufficient to prevent this sort of thing.

@edsantiago
Copy link
Member Author

I'm sorry. These errors are everywhere, constant, so I genuinely thought they were part of the WIP sqlite migration. Didn't even bother to file flakes for them. For the locking, would transactions help?

@mheon
Copy link
Member

mheon commented Mar 17, 2023

The issue appears to be that SQLite is allowing a write transaction to fire at the same time as another thread is reading the same table, resulting in errors. I don't think we want to wrap all reads in transactions for perf reasons, unfortunately.

The confusing bit is that this seems to contradict the sqlite docs, which say that by default all accesses through a single connection are sequential only. Simplest fix might just be added a mutex to control access and ensure that remains true.

Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing against your perl script but maybe it is not very well know: containers.conf supports the common .d config directory layout.
So instead of throwing this all in /etc/containers/containers.conf create a new file in /etc/containers/containers.conf.d/ with this entry. Seems much simpler to me.

@edsantiago
Copy link
Member Author

@vrothberg and I spent a while discussing conf options this week, including the .d possibility. The perl script isn't needed for that... but it will be needed at some point for testing containers.conf overrides. I'll follow up on Monday.

@vrothberg
Copy link
Member

Using .d should "just work" in CI even when a test sets a custom containers.conf via CONTAINERS_CONF.

@vrothberg
Copy link
Member

Using .d should "just work" in CI even when a test sets a custom containers.conf via CONTAINERS_CONF.

Actually not. Whenever CONTAINERS_CONF is set, only this file will be loaded and all system/user configs will be ignored.

@rhatdan
Copy link
Member

rhatdan commented Mar 20, 2023

We've been talking about CONTAINERS_CONF_D or something where all of the normal CONTAINERS_CONF processing would be done, but the value of then environment variable would be processed at the end, perhaps it is time to add that.

@vrothberg
Copy link
Member

We've been talking about CONTAINERS_CONF_D or something where all of the normal CONTAINERS_CONF processing would be done, but the value of then environment variable would be processed at the end, perhaps it is time to add that.

I am working on that at the moment :^)

Signed-off-by: Ed Santiago <santiago@redhat.com>
Signed-off-by: Ed Santiago <santiago@redhat.com>
Signed-off-by: Ed Santiago <santiago@redhat.com>
Signed-off-by: Ed Santiago <santiago@redhat.com>
Signed-off-by: Ed Santiago <santiago@redhat.com>
...it's useless clutter.

Signed-off-by: Ed Santiago <santiago@redhat.com>
There is no new version yet but we like to use the new code[1] to debug
a flake[2] in the podman CI. It will not fix it but the new error might
give us a better idea what is going on.

[1] checkpoint-restore/go-criu#175
[2] containers#18856

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. machine release-note-none
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants